Hopfield network of pattern recognition

Hopfield networks are a kind of recurrent neural network that model auto-associative memory: the ability to recall a memory from just a partial piece of that memory.


In [1]:
import os
import numbers
import numpy as np
import imageio
import matplotlib
from matplotlib import pyplot as plt
from scipy.misc import imread, imsave

%matplotlib inline

matplotlib.rcParams['figure.figsize'] = (20.0, 10.0)

np.random.seed(3)

Let's load in a meme. I'm partial to 'Deal with it'.


In [2]:
#deal = 2 * np.random.binomial(1,.5,size=(5,5)) - 1
#deal = imread('obama.png', mode="L")
deal = imread('small-deal-with-it-with-text.jpg', mode="L")
print(deal.shape)
deal = deal.astype(int)


(128, 128)

In [3]:
np.unique(deal)


Out[3]:
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 112, 113, 114, 115, 116, 117,
       118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130,
       131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
       144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156,
       157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169,
       170, 171, 172, 173, 174, 175, 177, 178, 179, 180, 181, 182, 183,
       184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196,
       197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
       210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222,
       223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235,
       236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248,
       249, 250, 251, 252, 253, 254, 255])

To convert this to a 1 bit image, I convert everything darker than some threshold to black (1), and everything else to white (-1). Experimenting a bit with the particular image of the 'deal with it meme' that I have, a threshold of 80 seemed to work reasonably. The resulting image is still a bit rough around the edges, but it's recognizable.


In [4]:
bvw_threshold = 80

deal[deal <= bvw_threshold] = -1
deal[deal >  bvw_threshold] = 1
deal = -deal
deal


Out[4]:
array([[-1, -1, -1, ..., -1, -1, -1],
       [-1, -1, -1, ..., -1, -1, -1],
       [-1, -1, -1, ..., -1, -1, -1],
       ..., 
       [-1, -1, -1, ..., -1, -1, -1],
       [-1, -1, -1, ..., -1, -1, -1],
       [-1, -1, -1, ..., -1, -1, -1]])

In [5]:
np.unique(deal)


Out[5]:
array([-1,  1])

In [6]:
plt.imshow(deal, cmap='Greys', interpolation='nearest');


Now train the weights. Whereas before we used Hebb's rule, now let's use the Storkey Learning Rule. This rule has a few nice advantages over Hebb's rule: it allows the network to learn more patterns (the 'capacity is n/sqrt(2*ln(n)) where n is the number of neurons in the network), its basins of attraction (to the stored patterns) are larger, the distribution of basin sizes is more even, and the shapes of the basins are more round. The weights at time v are:

where

and n is the number of neurons and $\epsilon$ is a bit (+1 or -1) of the pattern being trained at time v.

The second term of the rule is basically the Hebbian rule. The third and fourth terms basically account for the net input to neurons j and i using the current weights.

To see the development/testing of the below implementation of the Storkey rule, see 'Hopfield Network of memes--Storkey Learning Rule development'.


In [48]:
def storkey_rule(pattern, old_weights=None):
    """
    pattern: 2-dimensional array
    old_weights: square array of length pattern.shape[0]*pattern.shape[1]
    """
    
    mem = pattern.flatten()    
    n = len(mem)
    
    if old_weights is None:
        old_weights = np.zeros(shape=(n,n))

    hebbian_term  = np.outer(mem,mem)
    
    net_inputs = old_weights.dot(mem)
    net_inputs = np.tile(net_inputs, (n, 1)) # repeat the net_input vector n times along the rows 
                                             # so we now have a matrix
    
    # h_i and h_j should exclude input from i and j from h_ij
    h_i = np.diagonal(old_weights) * mem # this obtains the input each neuron receives from itself
    h_i = h_i[:, np.newaxis]             # turn h_i into a column vector so we can subtract from hij appropriately
    
    h_j = old_weights * mem              # element-wise multiply each row of old-weights by mem    
    np.fill_diagonal(h_j,0)              # now replace the diagonal of h_j with 0's; the diagonal of h_j is the 
                                         # self-inputs, which are redundant with h_i; np.fill_diagonal modifies inplace
    
    hij = net_inputs - h_i - h_j
    
    post_synaptic  = hij * mem
    #pre_synaptic = post_synaptic.T
    pre_synaptic   = hij.T * mem[:, np.newaxis]
        
    new_weights = old_weights + (1./n)*(hebbian_term - pre_synaptic - post_synaptic)
    
    return new_weights

This next cell can take a little while if the image is large. For an image of size 128x128, it takes a minute or two.


In [8]:
deal_weights = storkey_rule(deal, old_weights=None)
deal_weights


Out[8]:
array([[  6.10351562e-05,   6.10351562e-05,   6.10351562e-05, ...,
          6.10351562e-05,   6.10351562e-05,   6.10351562e-05],
       [  6.10351562e-05,   6.10351562e-05,   6.10351562e-05, ...,
          6.10351562e-05,   6.10351562e-05,   6.10351562e-05],
       [  6.10351562e-05,   6.10351562e-05,   6.10351562e-05, ...,
          6.10351562e-05,   6.10351562e-05,   6.10351562e-05],
       ..., 
       [  6.10351562e-05,   6.10351562e-05,   6.10351562e-05, ...,
          6.10351562e-05,   6.10351562e-05,   6.10351562e-05],
       [  6.10351562e-05,   6.10351562e-05,   6.10351562e-05, ...,
          6.10351562e-05,   6.10351562e-05,   6.10351562e-05],
       [  6.10351562e-05,   6.10351562e-05,   6.10351562e-05, ...,
          6.10351562e-05,   6.10351562e-05,   6.10351562e-05]])

Now start with a noisy version of the image. We'll just flip a certain number of random pixels on each row of the image.


In [9]:
def noisify(pattern, numb_flipped=30):

    noisy_pattern = pattern.copy()

    for idx, row in enumerate(noisy_pattern):
        choices = np.random.choice(range(len(row)), numb_flipped)
        noisy_pattern[idx,choices] = -noisy_pattern[idx,choices]
        
    return noisy_pattern

noisy_deal = noisify(pattern=deal)

In [10]:
plt.imshow(noisy_deal, cmap='Greys', interpolation='nearest');


Now we can start with that, and use the weights to update it. We'll update the units asynchronously (one at a time), and keep track of the energy of the network every so often.


In [46]:
def flow(pattern, weights, theta=0, steps = 50000):
    
    pattern_flat = pattern.flatten()

    if isinstance(theta, numbers.Number):
        thetas = np.zeros(len(pattern_flat)) + theta
    
    for step in range(steps):
        unit = np.random.randint(low=0, high=(len(pattern_flat)-1))
        unit_weights = weights[unit,:]
        net_input = np.dot(unit_weights,pattern_flat)
        pattern_flat[unit] = 1 if (net_input > thetas[unit]) else -1        
        #pattern_flat[unit] = np.sign(net_input)
        
        if (step % 10000) == 0:
            energy = -0.5*np.dot(np.dot(pattern_flat.T,weights),pattern_flat) + np.dot(thetas,pattern_flat)
            print("Energy at step {:05d} is now {}".format(step,energy))

    evolved_pattern = np.reshape(a=pattern_flat, newshape=(pattern.shape[0],pattern.shape[1]))
    
    return evolved_pattern

In [12]:
steps = 50000
theta = 0

noisy_deal_evolved = flow(noisy_deal, deal_weights, theta = theta, steps = steps)


Energy at step 00000 is now -2763.49658203125
Energy at step 10000 is now -4877.3243408203125
Energy at step 20000 is now -6212.64111328125
Energy at step 30000 is now -7071.2061767578125
Energy at step 40000 is now -7570.2667236328125

In [13]:
plt.imshow(noisy_deal_evolved, cmap='Greys', interpolation='nearest');


Voila.

Training the network on a second pattern

The cooler thing about the Hopfield networks is that they can encode multiple patterns (to a limit depending on the training regimen, and the number of units). So let's try another maymay.

I got the next meme from here, and then tweaked its levels in Mac's preview so that it'd translate nicely to a 1 bit (black or white) image.


In [14]:
woah = imread('woah.png', mode="L")
woah = woah.astype(int)
woah[woah >= 1] = 1
woah[woah < 1] = -1
woah = -woah

In [15]:
np.unique(woah)


Out[15]:
array([-1,  1])

In [16]:
plt.imshow(woah, cmap='Greys', interpolation='nearest');


Cool. So now we make some weights for this image. The takes a little bit longer than the Hebbian learning rule when it is dealing with previous, nonzero weights.


In [17]:
average_weights = storkey_rule(woah, old_weights=deal_weights)

Now, let's make a noisy Neil deGrasse Tyson, and have the network try to recover the clean, pristine NGT.


In [38]:
noisy_woah = noisify(pattern=woah, numb_flipped=15)
        
plt.imshow(noisy_woah, cmap='Greys', interpolation='nearest');



In [19]:
recovered_woah = flow(noisy_woah, average_weights, theta = theta, steps = steps)

plt.imshow(recovered_woah, cmap='Greys', interpolation='nearest');


Energy at step 00000 is now -2064.7373212724924
Energy at step 10000 is now -3646.2370708733797
Energy at step 20000 is now -4658.140086367726
Energy at step 30000 is now -5324.9750313311815
Energy at step 40000 is now -5672.545293405652

Now let's doublecheck that the average weights also still work for the 'deal with it' image.


In [20]:
deal_recovered = flow(noisy_deal, average_weights, theta = theta, steps = steps)

plt.imshow(deal_recovered, cmap='Greys', interpolation='nearest');


Energy at step 00000 is now -2055.6658030301332
Energy at step 10000 is now -3673.512946680188
Energy at step 20000 is now -4735.080832019448
Energy at step 30000 is now -5355.875614479184
Energy at step 40000 is now -5661.537752583623

Sweet. So now we can try something like feeding it a pattern that is halfway between the two patterns -- it should eventually settle into one of them! Who has greater meme strength!??!


In [21]:
deal_with_neil = (woah + deal) / 2
print(np.unique(deal_with_neil))


[-1.  0.  1.]

I could force those 0 values to -1 or 1, but that biases the pattern towards deal and neil, respectively (at least, testing suggested this -- I think because Neil has more black pixels and Deal has more white pixels). So, I'll leave them in. I could probably solve this by randomly setting 0's to 1 or -1, but naw.


In [22]:
#deal_with_neil[deal_with_neil == 0] = -1
#np.unique(deal_with_neil)

In [23]:
plt.imshow(deal_with_neil, cmap='Greys', interpolation='nearest');



In [24]:
recovered_deal_with_neil = flow(deal_with_neil, average_weights, theta = theta, steps = steps)

plt.imshow(recovered_deal_with_neil, cmap='Greys', interpolation='nearest');


Energy at step 00000 is now -5050.801895905286
Energy at step 10000 is now -5263.417067334056
Energy at step 20000 is now -5565.849911440164
Energy at step 30000 is now -5779.309322502464
Energy at step 40000 is now -5913.403473135084

Assuming the cells/pixels of 0 were unaltered, if you run that a few times, you'll notice that sometimes it settles on Neil, and sometimes it settles on Deal!!!

Spurious patterns

Hopfield networks can also settle onto 'spurious patterns' (patterns that the network wasn't trained on). For each stored pattern x, -x is a spurious pattern. But also, any linear combination of the of the learned patterns can be a spurious pattern. So let's learn a third pattern, and then see the network stabilize on a simple combination of the three patterns.


In [25]:
butt = imread('dick_butt.png', mode="L")
butt = butt.astype(int)
butt[butt >= 1] = 1
butt[butt < 1] = -1

plt.imshow(butt, cmap='Greys', interpolation='nearest');



In [26]:
average_weights = storkey_rule(butt, old_weights=average_weights)
average_weights


Out[26]:
array([[  6.92501671e-05,   6.92648828e-05,   6.92648828e-05, ...,
          6.92648828e-05,   6.92648828e-05,   6.92648828e-05],
       [  6.92648828e-05,   6.92501671e-05,   6.92648828e-05, ...,
          6.92648828e-05,   6.92648828e-05,   6.92648828e-05],
       [  6.92648828e-05,   6.92648828e-05,   6.92501671e-05, ...,
          6.92648828e-05,   6.92648828e-05,   6.92648828e-05],
       ..., 
       [  6.92648828e-05,   6.92648828e-05,   6.92648828e-05, ...,
          6.92501671e-05,   6.92648828e-05,   6.92648828e-05],
       [  6.92648828e-05,   6.92648828e-05,   6.92648828e-05, ...,
          6.92648828e-05,   6.92501671e-05,   6.92648828e-05],
       [  6.92648828e-05,   6.92648828e-05,   6.92648828e-05, ...,
          6.92648828e-05,   6.92648828e-05,   6.92501671e-05]])

In [27]:
noisy_butt = noisify(pattern=butt)
        
plt.imshow(noisy_butt, cmap='Greys', interpolation='nearest');



In [28]:
recovered_butt = flow(noisy_butt, average_weights, theta=theta, steps=steps)

plt.imshow(recovered_butt, cmap='Greys', interpolation='nearest');


Energy at step 00000 is now -2564.9289000578847
Energy at step 10000 is now -4556.647967345483
Energy at step 20000 is now -5831.266479625718
Energy at step 30000 is now -6612.842376593117
Energy at step 40000 is now -7008.175704851563

In [41]:
plt.imshow(woah, cmap='Greys', interpolation='nearest');



In [47]:
np.random.seed(10)
recovered_woah = flow(woah, average_weights, theta=theta, steps=70000)

plt.imshow(recovered_woah, cmap='Greys', interpolation='nearest');


Energy at step 00000 is now -5642.471253915535
Energy at step 10000 is now -5775.09856626564
Energy at step 20000 is now -5872.835127769589
Energy at step 30000 is now -5934.056589333968
Energy at step 40000 is now -5970.790708968796
Energy at step 50000 is now -6005.821960332049
Energy at step 60000 is now -6025.7535290914475

I don't understand why it settles on that...even if given the 'woah' image it was trained on, it evolves towards that pattern....

Okay, now let's make a spurious pattern. Any linear combination will do.


In [30]:
spurious_meme = butt + deal + woah
np.unique(spurious_meme)


Out[30]:
array([-3, -1,  1,  3])

In [31]:
spurious_meme[spurious_meme > 0] = 1
spurious_meme[spurious_meme < 0] = -1

In [32]:
plt.imshow(spurious_meme, cmap='Greys', interpolation='nearest');


Pretty noisy. Only Neal, and kiiiiinda the Deal with It, are visible. Now make a noisy version of that combination.


In [33]:
noisy_spurious_meme = noisify(pattern=spurious_meme)
        
plt.imshow(noisy_spurious_meme, cmap='Greys', interpolation='nearest');


Beautifully noisy. Can barely see anything in it. But now if we start with that, and apply the weights, it should recover the spurious pattern!


In [34]:
steps = 100000

recovered_spurious_meme = flow(noisy_spurious_meme, average_weights, theta=theta, steps=steps)

plt.imshow(recovered_spurious_meme, cmap='Greys', interpolation='nearest');


Energy at step 00000 is now -2214.0117660861233
Energy at step 10000 is now -3868.3670408007183
Energy at step 20000 is now -4943.118399732664
Energy at step 30000 is now -5595.730897042164
Energy at step 40000 is now -5962.680252565473
Energy at step 50000 is now -6233.365584154935
Energy at step 60000 is now -6332.204196493278
Energy at step 70000 is now -6374.972476783116
Energy at step 80000 is now -6432.369842451006
Energy at step 90000 is now -6449.6263557140755

And it sure as heck did.

Properties of the Storkey Learning Rule


In [ ]: